length ratio
RegexPSPACE: A Benchmark for Evaluating LLM Reasoning on PSPACE-complete Regex Problems
Jin, Hyundong, Hahn, Joonghyuk, Han, Yo-Sub
Large language models (LLMs) show strong performance across natural language processing (NLP), mathematical reasoning, and programming, and recent large reasoning models (LRMs) further emphasize explicit reasoning. Yet their computational limits, particularly spatial complexity constrained by finite context windows, remain poorly understood. While recent works often focus on problems within the NP complexity class, we push the boundary by introducing a novel benchmark grounded in two PSPACE-complete regular expression (regex) problems: equivalence decision (RegexEQ) and minimization (RegexMin). PSPACE-complete problems serve as a more rigorous standard for assessing computational capacity, as their solutions require massive search space exploration. We perform a double-exponential space exploration to construct a labeled dataset of over a million regex instances with a sound filtering process to build the benchmark. We conduct extensive evaluations on 6 LLMs and 5 LRMs of varying scales, revealing common failure patterns such as verbosity and repetition. With its well-defined structure and quantitative evaluation metrics, this work presents the first empirical investigation into the spatial computational limitations of LLMs and LRMs, offering a new framework for evaluating their advanced reasoning capabilities. Our code is available at https://github.com/hyundong98/RegexPSPACE .
Prompting LLMs: Length Control for Isometric Machine Translation
Javorský, Dávid, Bojar, Ondřej, Yvon, François
In this study, we explore the effectiveness of isometric machine translation across multiple language pairs (En$\to$De, En$\to$Fr, and En$\to$Es) under the conditions of the IWSLT Isometric Shared Task 2022. Using eight open-source large language models (LLMs) of varying sizes, we investigate how different prompting strategies, varying numbers of few-shot examples, and demonstration selection influence translation quality and length control. We discover that the phrasing of instructions, when aligned with the properties of the provided demonstrations, plays a crucial role in controlling the output length. Our experiments show that LLMs tend to produce shorter translations only when presented with extreme examples, while isometric demonstrations often lead to the models disregarding length constraints. While few-shot prompting generally enhances translation quality, further improvements are marginal across 5, 10, and 20-shot settings. Finally, considering multiple outputs allows to notably improve overall tradeoff between the length and quality, yielding state-of-the-art performance for some language pairs.
Length Aware Speech Translation for Video Dubbing
Chadha, Harveen Singh, Subramanian, Aswin Shanmugam, Joshi, Vikas, Bansal, Shubham, Xue, Jian, Mehta, Rupeshkumar, Li, Jinyu
In video dubbing, aligning translated audio with the source audio is a significant challenge. Our focus is on achieving this efficiently, tailored for real-time, on-device video dubbing scenarios. We developed a phoneme-based end-to-end length-sensitive speech translation (LSST) model, which generates translations of varying lengths--short, normal, and long--using predefined tags. Additionally, we introduced length-aware beam search (LABS), an efficient approach to generate translations of different lengths in a single decoding pass. This approach maintained comparable BLEU scores compared to a baseline without length awareness while significantly enhancing synchronization quality between source and target audio, achieving a mean opinion score (MOS) gain of 0.34 for Spanish and 0.65 for Korean, respectively.
The Robot of Theseus: A modular robotic testbed for legged locomotion
Urs, Karthik, Carlson, Jessica, Manohar, Aditya Srinivas, Rakowiecki, Michael, Alkayyali, Abdulhadi, Saunders, John E., Tulbah, Faris, Moore, Talia Y.
Robotic models are useful for independently varying specific features, but most quadrupedal robots differ so greatly from animal morphologies that they have minimal biomechanical relevance. Commercially available quadrupedal robots are also prohibitively expensive for biological research programs and difficult to customize. Here, we present a low-cost quadrupedal robot with modular legs that can match a wide range of animal morphologies for biomechanical hypothesis testing. The Robot Of Theseus (TROT) costs approximately $4000 to build out of 3D printed parts and standard off-the-shelf supplies. Each limb consists of 2 or 3 rigid links; the proximal joint can be rotated to become a knee or elbow. Telescoping mechanisms vary the length of each limb link. The open-source software accommodates user-defined gaits and morphology changes. Effective leg length, or crouch, is determined by the four-bar linkage actuating each joint. The backdrivable motors can vary virtual spring stiffness and range of motion. Full descriptions of the TROT hardware and software are freely available online. We demonstrate the use of TROT to compare locomotion among extant, extinct, and theoretical morphologies. In addition to biomechanical hypothesis testing, we envision a variety of different applications for this low-cost, modular, legged robotic platform, including developing novel control strategies, clearing land mines, or remote exploration. All CAD and code is available for download on the TROT project page.
ViT-VS: On the Applicability of Pretrained Vision Transformer Features for Generalizable Visual Servoing
Scherl, Alessandro, Thalhammer, Stefan, Neuberger, Bernhard, Wöber, Wilfried, Gracía-Rodríguez, José
Visual servoing enables robots to precisely position their end-effector relative to a target object. While classical methods rely on hand-crafted features and thus are universally applicable without task-specific training, they often struggle with occlusions and environmental variations, whereas learning-based approaches improve robustness but typically require extensive training. We present a visual servoing approach that leverages pretrained vision transformers for semantic feature extraction, combining the advantages of both paradigms while also being able to generalize beyond the provided sample. Our approach achieves full convergence in unperturbed scenarios and surpasses classical image-based visual servoing by up to 31.2\% relative improvement in perturbed scenarios. Even the convergence rates of learning-based methods are matched despite requiring no task- or object-specific training. Real-world evaluations confirm robust performance in end-effector positioning, industrial box manipulation, and grasping of unseen objects using only a reference from the same category. Our code and simulation environment are available at: https://alessandroscherl.github.io/ViT-VS/
IKUN for WMT24 General MT Task: LLMs Are here for Multilingual Machine Translation
Liao, Baohao, Herold, Christian, Khadivi, Shahram, Monz, Christof
This paper introduces two multilingual systems, IKUN and IKUN-C, developed for the general machine translation task in WMT24. IKUN and IKUN-C represent an open system and a constrained system, respectively, built on Llama-3-8b and Mistral-7B-v0.3. Both systems are designed to handle all 11 language directions using a single model. According to automatic evaluation metrics, IKUN-C achieved 6 first-place and 3 second-place finishes among all constrained systems, while IKUN secured 1 first-place and 2 second-place finishes across both open and constrained systems. These encouraging results suggest that large language models (LLMs) are nearing the level of proficiency required for effective multilingual machine translation. The systems are based on a two-stage approach: first, continuous pre-training on monolingual data in 10 languages, followed by fine-tuning on high-quality parallel data for 11 language directions. The primary difference between IKUN and IKUN-C lies in their monolingual pre-training strategy. IKUN-C is pre-trained using constrained monolingual data, whereas IKUN leverages monolingual data from the OSCAR dataset. In the second phase, both systems are fine-tuned on parallel data sourced from NTREX, Flores, and WMT16-23 for all 11 language pairs.
Prior Constraints-based Reward Model Training for Aligning Large Language Models
Zhou, Hang, Wang, Chenglong, Hu, Yimin, Xiao, Tong, Zhang, Chunliang, Zhu, Jingbo
Reinforcement learning with human feedback for aligning large language models (LLMs) trains a reward model typically using ranking loss with comparison pairs. However, the training procedure suffers from an inherent problem: the uncontrolled scaling of reward scores during reinforcement learning due to the lack of constraints while training the reward model. This paper proposes a Prior Constraints-based Reward Model (PCRM) training method to mitigate this problem. PCRM incorporates prior constraints--specifically, length ratio and cosine similarity between outputs of each comparison pair--during reward model training to regulate optimization magnitude and control score margins. We comprehensively evaluate PCRM by examining its rank correlation with human preferences and its effectiveness in aligning LLMs via RL. Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling. As another bonus, our method is easily integrated into arbitrary rank-based alignment methods, such as direct preference optimization, and can yield consistent improvement. The code is available at https://github.com/wangclnlp/
A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages
Arnett, Catherine, Chang, Tyler A., Bergen, Benjamin K.
How should text dataset sizes be compared across languages? Even for content-matched (parallel) corpora, UTF-8 encoded text can require a dramatically different number of bytes for different languages. In our work, we define the byte premium between two languages as the ratio of bytes used to encode content-matched text in those languages. We compute byte premiums for 1155 languages, and we use linear regressions to estimate byte premiums for other languages. We release a tool to obtain byte premiums for any two languages, enabling comparisons of dataset sizes across languages for more equitable multilingual model development and data practices.
On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models
Wang, Jiongxiao, Wu, Junlin, Chen, Muhao, Vorobeychik, Yevgeniy, Xiao, Chaowei
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, playing an important role in LLMs alignment. Despite its advantages, RLHF relies on human annotators to rank the text, which can introduce potential security vulnerabilities if any adversarial annotator (i.e., attackers) manipulates the ranking score by up-ranking any malicious text to steer the LLM adversarially. To assess the red-teaming of RLHF against human preference data poisoning, we propose RankPoison, a poisoning attack method on candidates' selection of preference rank flipping to reach certain malicious behaviors (e.g., generating longer sequences, which can increase the computational cost). With poisoned dataset generated by RankPoison, we can perform poisoning attacks on LLMs to generate longer tokens without hurting the original safety alignment performance. Moreover, applying RankPoison, we also successfully implement a backdoor attack where LLMs can generate longer answers under questions with the trigger word. Our findings highlight critical security challenges in RLHF, underscoring the necessity for more robust alignment methods for LLMs.
Designing a Magnetic Micro-Robot for Transporting Filamentous Microcargo
In recent years, the medical industry has witnessed a growing interest in minimally invasive procedures, with magnetic microrobots emerging as a promising approach. These micro-robots possess the ability to navigate through various media, including viscoelastic and non-Newtonian fluids, enabling targeted drug delivery and medical interventions. Many current designs, inspired by micro-swimmers in biological systems like bacteria and sperm, employ a contact-based method for transporting a payload. Adhesion between the cargo and the carrier can make release at the target site problematic. In this project, our primary objective was to explore the potential of a helical micro-robot for non-contact drug or cargo delivery. We conducted a comprehensive study on the shape and geometrical parameters of the helical microrobot, specifically focusing on its capability to transport passive filaments. Based on our analysis, we propose a novel design consisting of three sections with alternating handedness, including two pulling and one pushing microhelices, to enhance the capture and transport of passive filaments in Newtonian fluids using a non-contact approach. We then simulated the process of capturing and transporting the passive filament, and tested the functionality of the newly designed micro-robot. Our findings offer valuable insights into the physics of helical micro-robots and their potential for medical procedures and drug delivery. Furthermore, the proposed non-contact method for delivering filamentous cargo could lead to the development of more efficient and effective microrobots for medical applications.